Behavior Research Methods
Springer Science and Business Media LLC
Preprints posted in the last 30 days, ranked by how well they match the content profile of Behavior Research Methods, based on 25 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is an above-average fit.
Maracia, B. C. B.; Souza, T. R.; Oliveira, G. S.; Nunes, J. B. P.; dos Santos, C. E. S.; Peixoto, C. B.; Lopes-Silva, J. B.; Nobrega, L. A. O. d. A.; Araujo, P. A. d.; Souza, R. P.; Souza, B. R.
Dance is a core form of human-environment interaction and a powerful medium for emotional expression, yet dancers are routinely exposed to environmental affective cues that may shape their movement. We tested whether a negative emotional context induced immediately before improvisation alters dance biomechanics. Twenty professional dancers performed two 3-min improvised dances. Between dances, they viewed either Neutral or Negatively valenced pictures from the International Affective Picture System (IAPS; 2 min 40 s, 5 s per image). Eye tracking verified attention to the visual stream. Mood was assessed at four time points (PT1-PT4) using the Brazilian Mood Scale (BRAMS), and full-body, three-dimensional kinematics were captured at 300 Hz using a 9-camera optoelectronic system (Qualisys) and processed to measure global movement amplitude and expansion. Negative IAPS exposure increased tension, depression, fatigue, and decreased vigor from PT2 to PT3. Biomechanically, the Negative Stimulus dancers showed a significant reduction in global movement amplitude after negative IAPS exposure, with reduced movement amplitude of the body extremities. In contrast, global movement expansion remained unchanged; that is, the extremities were not positioned closer or farther from the pelvis. Neutral images produced no mood change and no measurable modulation of movement amplitude or expansion. Together, these results support the hypothesis that improvised dance carries biomechanical signatures of the dancer's current affective state, beyond the intended expressive content, and provide an automated motion-capture workflow for studying emotion-movement coupling in spontaneous dance.
Highlights: Negative visual context shifted dancers' mood toward negative affect. Negative images reduced movement amplitude in improvised dance. Movement expansion remained stable despite mood induction. (Graphical abstract: Figure 1.)
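The two kinematic outcome measures named in the abstract, global movement amplitude and expansion, can be sketched from raw 3D marker trajectories. This is an illustrative reconstruction, assuming amplitude is taken as a marker's range of motion and expansion as the extremity-to-pelvis distance; the study's actual Qualisys processing pipeline is not described at this level of detail.

```python
import math

def marker_amplitude(traj):
    """Per-axis range of motion of one marker; overall amplitude is taken
    here as the Euclidean norm of the three axis ranges (an assumption)."""
    ranges = [max(p[a] for p in traj) - min(p[a] for p in traj) for a in range(3)]
    return math.sqrt(sum(r * r for r in ranges))

def mean_expansion(extremity, pelvis):
    """Mean frame-by-frame distance between an extremity marker and the pelvis."""
    dists = [math.dist(e, p) for e, p in zip(extremity, pelvis)]
    return sum(dists) / len(dists)

# Toy trajectories: a hand sweeping 1 m along x while a pelvis marker stays put.
pelvis = [(0.0, 0.0, 1.0)] * 5
hand = [(0.25 * i, 0.5, 1.0) for i in range(5)]
print(round(marker_amplitude(hand), 3))
print(round(mean_expansion(hand, pelvis), 3))
```

On the toy data the hand covers a 1 m range while its distance to the pelvis varies only between about 0.5 and 1.1 m, illustrating how amplitude can change while expansion stays comparatively stable, the dissociation the study reports.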
Stowell, D.; Nolasco, I.; McEwen, B.; Vidana Vila, E.; Jean-Labadye, L.; Benhamadi, Y.; Lostanlen, V.; Dubus, G.; Hoffman, B.; Linhart, P.; Morandi, I.; Cazau, D.; White, E.; White, P.; Miller, B.; Nguyen Hong Duc, P.; Schall, E.; Parcerisas, C.; Gros-Martial, A.; Moummad, I.
Computational bioacoustics has seen significant advances in recent decades. However, the rate of insights from automated analysis of bioacoustic audio lags behind our rate of collecting the data, due to key capacity constraints in data annotation and bioacoustic algorithm development. Gaps in analysis methodology persist not because they are intractable, but because of resource limitations in the bioacoustics community. To bridge these gaps, we advocate the open science method of data challenges, structured as public contests. We conducted a bioacoustics data challenge named BioDCASE, within the format of an existing event (DCASE). In this work we report on the procedures needed to select and then conduct useful bioacoustics data challenges. We consider aspects of task design such as dataset curation, annotation, and evaluation metrics. We report the three tasks included in BioDCASE 2025 and the resulting progress made. Based on this we make recommendations for open community initiatives in computational bioacoustics.
Langford, J.; Chua, J. Y.; Long, I.; Williams, A. C.; Hillsdon, M.
The increasing use of accelerometers as digital health technologies in clinical trials and clinical care is driving the need for data processing to meet medical standards. The aim of this study was to create and test a modular pipeline for the pre-processing of high-resolution accelerometry that assures the quality, transparency and traceability of digital measures from sensor-level data. The objective is for the pipeline to be a foundational layer in the development, implementation and comparison of measures. The study developed the open GENEAcore package to meet the requirements of regulators, verifying the engineering implementation and analytically validating outputs against reference datasets. Early stages included the optimisation of calibration and non-wear detection. Data-driven detection of behavioural transitions was then validated to give direct bout outputs without the need to identify rules for epoch aggregation and interruptions. The utility for measure development was shown by comparing two algorithms for the characterisation of activity intensity in both the epoch and bout paradigms. Non-wear was detected with a balanced accuracy of 92.3% and the commonly used 13 mg acceleration standard deviation threshold was empirically validated for the first time. The detection of transitions proved reliable with 99% detected, on average, within 2 seconds of their occurrence to give a mean expected event duration of 68.6 s from a log-normal distribution. The different activity intensity algorithms were more than 99% concordant during movement but their outputs diverged in low movement conditions. Importantly, variable duration bouts created 31% higher daily activity durations compared to 1-second epochs. This evaluation of pre-processing steps has confirmed the attention to detail required to create robust and reproducible results for later clinical validation where small changes in an algorithm or its implementation may have clinically meaningful consequences.
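The 13 mg standard-deviation rule validated above can be sketched as a rolling-window check on the three acceleration axes; the window length and the all-axes-below-threshold logic here are illustrative assumptions, not GENEAcore's actual implementation.

```python
import statistics

NONWEAR_SD_G = 0.013  # the 13 mg threshold empirically validated in the paper

def nonwear_mask(axes, window):
    """Flag each full window as non-wear when every axis's standard
    deviation falls below the 13 mg threshold (window length is an
    illustrative choice, not the package's setting)."""
    n = len(axes[0]) // window
    mask = []
    for w in range(n):
        sds = [statistics.pstdev(ax[w * window:(w + 1) * window]) for ax in axes]
        mask.append(all(sd < NONWEAR_SD_G for sd in sds))
    return mask

# Toy signal: first window nearly still (device off wrist), second window moving.
still = [1.0, 1.0001, 0.9999, 1.0]
moving = [1.0, 0.8, 1.3, 0.9]
x = still + moving
y = [0.0] * 8
z = [0.0] * 8
print(nonwear_mask([x, y, z], window=4))  # [True, False]
```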
Huffman, D. J.; Annes, P. J.; Gowda, C.; Colina, L.
Spatial navigation could theoretically serve as an early neurobehavioral marker of Alzheimer's disease risk, yet technological limitations have hindered its widespread adoption. We leveraged breakthroughs in technology to create a custom smartphone application to compare real-world spatial memory with lab-based measures. Specifically, we compared performance across two established lab-based tasks, judgments of relative direction (JRD) and map drawing, and our novel app-based, in situ pointing task administered in a familiar large-scale, real-world environment. Young adults completed both laboratory and mobile navigation tasks, allowing within-subject comparisons across modalities. JRD performance strongly correlated with map drawing performance. In contrast, app-based pointing showed lower error and reduced inter-individual variability relative to JRD performance, but weak correlations with lab-based measures. We also developed a novel analytical technique in which we transformed the app-based pointing into a relational, JRD-like metric, and we observed strong correlations and correlated patterns of errors across all tasks. Thus, real-world, app-based pointing captures stable directional performance (e.g., as indexed by the lower errors and lower variability relative to the JRD task) and, when expressed in a common framework, correlates with laboratory measures of spatial memory, suggesting that these tasks tap into partially overlapping cognitive representations. These results provide a pivotal advancement to our understanding of both shared and unique variance across spatial memory paradigms, and support the use and further development of mobile navigation tools as scalable complements to lab-based assessments for studying spatial cognition and its decline in preclinical and clinical stages of Alzheimer's disease.
Highlights: Spatial memory is a core cognitive function and is impaired in Alzheimer's disease. Testing memory in large-scale, real-world environments enhances ecological validity. We compared performance of our novel real-world measure with lab measures. We observed strong correlations between the lab-based measures. We observed shared and unique variance between lab- and real-world measures.
Marques, D.; Barbosa-Morais, N. L.; Reis, C. C. P.
Actigraphy is a non-invasive and cost-effective method for monitoring behavioral rhythms under real-world conditions by collecting time-resolved measurements of locomotor activity, light exposure, and temperature. Although several open-source packages support specific aspects of actigraphy analysis, functions such as preprocessing, metric calculation, and mathematical modeling are often distributed across separate software packages, limiting interoperability and increasing programming overhead. Here we introduce circStudio, a Python package that unifies actigraphy data processing and mathematical modeling of circadian rhythms within a single framework. Built from the pyActigraphy codebase and integrating circadian models from the Arcascope circadian package, circStudio provides flexible preprocessing tools, support for multiple actigraphy file formats through adaptor classes, standalone functions for computing commonly used actigraphy metrics, and implementations of several mathematical models of circadian rhythms. The package enables users to move efficiently from raw wearable data to physiologically interpretable circadian outputs. Ultimately, circStudio aims to facilitate reproducible workflows and to provide a flexible foundation for research applications across circadian biology, sleep science, and digital health.
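One of the standard actigraphy metrics such a package exposes is interdaily stability (IS), the day-to-day regularity of the activity rhythm. A minimal pure-Python sketch of the textbook definition follows; circStudio's actual function names and API are not assumed here.

```python
def interdaily_stability(activity, samples_per_day):
    """Interdaily stability (IS): variance of the average 24 h profile
    divided by the overall variance; 1 = perfectly repeating days, 0 = noise.
    (Standard actigraphy definition, not circStudio's exact code.)"""
    n = len(activity)
    mean = sum(activity) / n
    p = samples_per_day
    # Average activity across days for each within-day time slot.
    profile = [
        sum(activity[i] for i in range(slot, n, p)) / (n // p)
        for slot in range(p)
    ]
    num = n * sum((x - mean) ** 2 for x in profile)
    den = p * sum((x - mean) ** 2 for x in activity)
    return num / den

# Two identical toy "days" of 4 samples each: a perfectly stable rhythm.
day = [0.0, 5.0, 10.0, 5.0]
print(interdaily_stability(day * 2, samples_per_day=4))  # 1.0
```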
Gargano, J. A.; Rice, A.; Chari, D. A.; Parrell, B.; Lammert, A. C.
Reverse correlation is a widely used and well-established method for probing latent perceptual representations in which subjects render subjective preference responses to ambiguous stimuli. Stimuli are purposefully designed to have no direct relationship with the target representation (e.g., they are randomly generated), a property which makes each individual stimulus minimally informative toward reconstructing the target, and often difficult for subjects to interpret. As a result, a large number of stimulus-response pairs must be gathered from a given subject in order for reconstructions to be of sufficient quality, making the task fatiguing. Recent work has demonstrated that the number of trials needed can be substantially reduced using a compressive sensing framework that incorporates into the reconstruction process the assumption that the target representation can be sparsely represented in some basis. Here, we introduce an alternative method that incorporates the sparsity assumption directly into stimulus generation, which holds promise not only for improving efficiency, but also for improving the interpretability of stimuli from the subjects' perspective. We develop this new method as a mathematical variation of the compressive sensing approach, before conducting one simulation study and two human subjects experiments to assess the benefits of this method to reconstruction quality, sample size efficiency, and subjective interpretability. Results show that sparse stimulus generation improves all three of these areas relative to conventional reverse correlation approaches, and also relative to compressive sensing in most conditions.
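The core idea, generating each stimulus from only a few basis vectors rather than from full-bandwidth noise, can be sketched as follows; the bump basis and the ±1 weighting are illustrative assumptions, not the authors' actual stimulus construction.

```python
import random

def sparse_stimulus(basis, k, rng):
    """Draw a stimulus that is k-sparse in the given basis: pick k basis
    vectors, weight them with random signs, and sum them. (Illustrative
    sketch; the paper's basis and weighting scheme may differ.)"""
    n = len(basis[0])
    idx = rng.sample(range(len(basis)), k)
    coeffs = {i: rng.choice([-1.0, 1.0]) for i in idx}
    return [sum(c * basis[i][j] for i, c in coeffs.items()) for j in range(n)]

# Toy basis: 4 non-overlapping "bump" vectors over an 8-sample stimulus.
basis = [[1.0 if j // 2 == i else 0.0 for j in range(8)] for i in range(4)]
rng = random.Random(0)
stim = sparse_stimulus(basis, k=2, rng=rng)
# Only 2 * 2 = 4 samples are non-zero, versus all 8 for white noise.
print(sum(1 for v in stim if v != 0.0))  # 4
```

Because each stimulus activates only a few structured components, it is both easier for subjects to interpret and more informative per trial under the sparsity assumption.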
Amthor, L. I.; Bruengger, O.; Buehler, M.; Monn, A.; Provaznikova, B.; Kronenberg, G.; Olbrich, S.; Welt, T.
Background: Autonomous sensory meridian response (ASMR) and music-induced frisson are sensory-affective phenomena characterized by tingling, chills, and pronounced emotional responses. Previous research has mainly focused on physiological changes during these experiences, whereas much less is known about whether baseline physiological state is associated with subsequent susceptibility. Objective: To examine whether baseline autonomic flexibility, indexed primarily by heart rate variability (HRV), is associated with later ASMR/frisson responsiveness. Resting EEG measures were included as secondary exploratory markers. Methods: Fifteen participants were recruited by convenience sampling; after artifact-based exclusion, 10 participants were included in the analyses. A 5-minute resting baseline EEG and ECG was recorded prior to stimulus presentation. Participants were then exposed to auditory and audiovisual ASMR stimuli, classical music excerpts, and a control stimulus, and reported whether they had experienced ASMR-typical sensations or frisson. Main analyses examined associations between baseline physiological parameters and a combined response-positive outcome. Exploratory analyses included participant-level correlations, comparisons between susceptible and non-susceptible participants, and stimulus-specific effect sizes. Results: HRV-related measures showed the clearest and most consistent pattern of association with responsiveness. Higher baseline total HRV power was associated with a greater number of response-positive stimuli (r = 0.756, p = 0.011), with similar positive associations for high-frequency HRV (HF; r = 0.672, p = 0.033) and baseline heart rate slope (r = 0.751, p = 0.012). Stimulus-specific analyses likewise showed the most consistent positive baseline effects for total HRV power, with HF and heart rate slope pointing in the same direction.
Frontal alpha asymmetry (FAA) was negatively associated with responsiveness (ρ = -0.862, p = 0.001), but EEG findings overall were less consistent than the HRV-related pattern and are best interpreted as secondary exploratory observations. Conclusions: In this exploratory pilot sample, baseline HRV, particularly total HRV power, showed the most coherent physiological association with susceptibility to ASMR and music-induced frisson. The findings are consistent with the possibility that these experiences depend not only on stimulus properties, but also on pre-existing physiological state. Given the small sample and exploratory design, the results should be interpreted as hypothesis-generating and require replication in larger confirmatory studies.
Tam, S. K. E.; Xiao, X.; Cheng, X.; Kwok, S. C.; Becker, B.
Background and aims: Perseverative behaviours are commonly assessed using operant paradigms in which rodents work for drugs or food under physiological deprivation, limiting translational relevance to some behavioural addictions. Here we validated an operant paradigm in which the acquired behaviour is driven neither by physiological needs nor hedonic responses. Methods: Mice were trained to lever-press for green light. Exp. 1 used a within-subjects design to examine lever discrimination and whether responding could be "satiated" by light preexposure. Exp. 2 examined instrumental contingency using a between-subjects design, with light delivery equated between contingent and non-contingent groups. Exp. 3 replaced green light with dim red light producing less retinal photoreceptor excitation but comparable heat to assess non-photic cues. Exp. 4 examined whether green light could affect food seeking under different motivational states. Results: In Exp. 1, green light supported lever discrimination. Among high responders, the satiation effect was modest (<15% reduction) and did not deter lever pressing. In Exp. 2, instrumental contingency promoted response acquisition whereas random light delivery did not. In Exp. 3, dim red light failed to sustain behaviour, producing a ~50% response decrement. In Exp. 4, light potentiated food seeking under ad libitum feeding. Discussion and conclusions: Response-contingent light serves as a reward to establish operant responding, which cannot be explained by alerting effects or thermal cues. Our study bridges the gap between animal models and findings from humans that coloured light may exacerbate smartphone use and that light therapy may reshape reward circuits in individuals with Internet gaming disorder symptoms [Li et al. (2026) Advanced Science 13:e14044].
Chowdhury, A.; Irtiza, A.
Background: Urgent care departments in Europe face a structural paradox: accelerating digitalisation is accompanied by a patient population that is disproportionately unable to engage with standard digital tools. An internal analysis at the Emergency Department (Akutafdelingen) of Nordsjaellands Hospital in Hilleroed, Denmark found that 43% of emergency patients struggle with digital solutions - a figure that reflects the predictable composition of acute care populations rather than any individual failing. Objective: This paper presents the design, iterative development, and secondary validation of the ED Adaptive Interface (v5): a prototype adaptive patient terminal developed in response to this challenge. The system operationalises what the author terms impairment-first design - a methodology that treats the most constrained patient experience as the primary design problem and derives the standard experience as a subset. The interface configures itself in under ten seconds via nurse-led setup, adapting across four axes of impairment: visual, motor, speech, and cognitive. System: Version 4 supports five accessibility modes, a heatmap pain assessment grid, a Privacy and Dignity panel, a live workflow tracker with care notifications, structured dual-category help requests, and plain-language medical term definitions across four languages. Version 5, reported here for the first time, introduces a Condition Worsening Escalation button, a Referral Pathway Display, a "Why Am I Waiting?" triage explainer, a Symptom Progression Log, MinSP/Yellow Card Scan simulation, expanded language support (seven languages: English, Danish, Arabic with full RTL layout, Turkish, Romanian, Polish, and Somali), and an expanded ten-item Communication Board. The entire system runs as a single 79-kilobyte HTML file with zero infrastructure requirements.
Methods: To base the design on patient-generated evidence, two independent social media threads were subjected to an inductive thematic analysis (Braun and Clarke, 2006): a primary corpus of 83 entries in the Facebook group Foreigners in Denmark (collected March 2026) and a corroborating corpus in an international community group in the Aarhus region (collected April 2026). All identifiers in both datasets were fully anonymised under GDPR Article 89 research provisions prior to analysis. No participants were contacted. Generative AI tools were used to assist with drafting, writing, and prototype code development; all scientific content, data collection, analysis, and conclusions are the sole responsibility of the authors. Results: The first discourse corpus produced five major themes corresponding to the five problem areas the prototype was designed to address: system navigation and triage literacy gaps (31 entries); language and cultural barriers (6 entries); communication failures during care (5 entries); staff overload and capacity constraints (8 entries); and pain and severity assessment failures (14 entries). The corroborating dataset supported all five themes and introduced two additional themes: differential treatment of international patients and medical gaslighting as a long-term pattern of patient advocacy failure. One structural finding - the five most-liked comments incorrectly criticised the original poster for self-referring when she had received explicit 1813 telephone triage approval - directly inspired the Referral Pathway Display and "Why Am I Waiting?" features in v5. Conclusions: The convergence of design rationale and independent social evidence across all five problem categories suggests that impairment-first design is not a niche accessibility concern but a structural approach to healthcare interface quality. The prototype is ready for a structured clinical pilot using the System Usability Scale (SUS) and semi-structured staff interviews. 
The long-term roadmap includes full MinSP integration, hospital PMS connectivity, and clinical validation.
Henley, K. Y.; Bozeman, A. L.; Pat, B. M.; Floyd, C. L.
The use of domestic pigs in clinical training and biomedical research is expanding rapidly, increasing the need for reliable, noninvasive indicators of health and welfare. Vocal analysis offers a promising non-invasive tool, yet the vocalization repertoire of adult domestic pigs has yet to be characterized. This study characterizes the vocal repertoire of adult pigs housed in a biomedical research laboratory. Twelve mixed-breed pigs (2-3 months old; 5 males, 7 females) were recorded during routine husbandry and experimental procedures. Vocal classification was conducted using perceptual and objective clustering techniques. First, aural-visual (AV) inspection of spectrograms was used to construct a hierarchical repertoire. Second, a two-step cluster analysis based on six acoustic parameters (5% frequency, first quartile frequency, center frequency, 90% bandwidth, interquartile range bandwidth, and 90% duration) provided an objective classification. Agreement between methods was evaluated using Cramér's V. A total of 1,136 vocalizations from 69 recordings were analyzed. AV classification revealed five major vocal classes (grunt, squeal, complex, scream, and bark), subdividing into 16 distinct call types. Standardized definitions integrating descriptive and quantitative criteria are provided. The two-step cluster analysis identified two clusters as the optimal statistical solution, with moderate agreement between methods (Cramér's V = 0.67, p < 0.0001). Most AV-defined call types aligned with previously reported repertoires, although whines, yelps, and stable screams were unique to this study. While two-cluster solutions are commonly reported, our findings indicate that richer acoustic structure exists and that high gradation among pig calls may limit the resolution of statistical clustering.
These results establish a detailed acoustic framework for adult pig vocalizations and provide essential groundwork for developing predictive models to enhance welfare assessment and support comparative research in laboratory-housed pigs.
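The Cramér's V statistic used above to quantify agreement between the AV-defined call types and the statistical clusters is computed from their contingency table; a minimal sketch follows (the 2x2 toy table is invented for illustration).

```python
import math

def cramers_v(table):
    """Cramér's V for an r x c contingency table (e.g., AV-defined call
    types vs. statistical clusters), via the chi-square statistic."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(row[j] for row in table) for j in range(len(table[0]))]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_sums[i] * col_sums[j] / total  # expected under independence
            chi2 += (obs - exp) ** 2 / exp
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (total * k))

# Toy 2x2 cross-classification: strong but imperfect agreement.
table = [[40, 10],
         [10, 40]]
print(round(cramers_v(table), 2))  # 0.6
```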
Ceolini, E.; Band, G.; Ghosh, A.
Fine-grained temporal structures emerge in smartphone behavioral recordings over multi-day periods. Complex systems research suggests that emergent temporal structures reflect underlying resource constraints of the system. Here we test whether cognitive abilities measured through speeded tasks (spanning fractions of a second) are reflected in emergent smartphone temporal structures spanning days, revealing how cognitive resource limitations shape naturalistic behavior. We analyzed smartphone tap interval patterns accumulated over several days and used decision tree regression models to predict performance in simple and choice reaction time tasks from these patterns. Simple reaction time was poorly predicted (R2 = 0.003), indicating that basic sensorimotor constraints play only a marginal role in shaping real-world behavioral timing. In contrast, choice reaction time was moderately predictable (R2 = 0.4), demonstrating that higher-order cognitive constraints prominently influence naturalistic temporal organization. Notably, while task performance operates at sub-second timescales, the predictive temporal patterns in smartphone behavior spanned milliseconds to several seconds and were accumulated over days, revealing the broad, multi-scale influence of cognitive resource constraints on everyday behavior. Both predicted and measured choice reaction times showed age-related decline, but the decline was more pronounced in predicted values, suggesting that age-related cognitive changes may be amplified in naturalistic contexts. These findings demonstrate that emergent temporal structures in smartphone use can reveal how cognitive processes measured using speeded tasks manifest, or fail to manifest, in real-world behavior, and that complex-systems approaches can bridge laboratory and naturalistic assessments of cognition, revealing which cognitive processes meaningfully constrain real-world behavior.
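The kind of multi-day tap-interval feature such models consume can be sketched as a log-binned interval histogram spanning milliseconds to seconds; the bin edges and normalization below are illustrative assumptions, not the paper's exact feature set.

```python
def interval_histogram(tap_times, edges):
    """Accumulate inter-tap intervals into log-spaced bins; the normalized
    histogram is the sort of multi-day feature vector a decision tree
    regressor could consume (bin edges here are illustrative choices)."""
    intervals = [b - a for a, b in zip(tap_times, tap_times[1:])]
    counts = [0] * (len(edges) - 1)
    for iv in intervals:
        for b in range(len(edges) - 1):
            if edges[b] <= iv < edges[b + 1]:
                counts[b] += 1
                break
    total = sum(counts) or 1
    return [c / total for c in counts]

# Log-spaced edges from 100 ms to 10 s, and a toy tap sequence (seconds).
edges = [0.1 * 10 ** (i / 2) for i in range(5)]  # ~0.1, 0.32, 1.0, 3.2, 10.0
taps = [0.0, 0.2, 0.4, 1.5, 6.5]
print(interval_histogram(taps, edges))
```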
Noerenberg, W.; Schweitzer, R.; Rolfs, M.
Saccadic eye movements sweep the visual scene across the retina, yet the resulting motion is rarely perceived. Visual factors alone, such as the presence of static pre- and post-saccadic images, can attenuate motion perception, suggesting a masking of the motion signal during early visual processing. Here, we isolated the visual component of this reduction in motion perception using simulated saccades presented to fixating observers. Across two experiments, we manipulated motion amplitude (6-18 dva), duration, and velocity profile and measured perceived amplitude and velocity at varying masking durations. Visual masking strongly reduced perceived motion amplitude and velocity, with short halftimes (~15 ms) that were largely invariant across saccade amplitudes. Critically, motion following a naturalistic saccadic velocity profile was perceived as smaller and slower than constant-velocity motion matched in amplitude and duration, even without explicit masking. This additional reduction increased with both amplitude and duration. These results show that visual mechanisms alone can account for substantial motion reduction across a large range of amplitudes and demonstrate a partially separable contribution of the saccadic velocity profile, suggesting that the temporal structure of retinal motion itself supports perceptual continuity across eye movements.
Horvath, G.; Rado, J.; Czigler, A.; Fülöp, D.; Sari, Z.; Kovacs, I.; Buzas, P.; Jando, G.
Binocular vision depends on the integration of matching visual features across the two eyes, while conflicting interocular signals can engage active inhibitory processes in the visual system. To investigate the temporal dynamics of these putative inhibitory processes, we examined how transitions between different binocular correlation states influence perceptual detectability and response speed. Using dynamic random-dot correlograms, free of monocular cues and allowing precise interocular manipulation, we presented brief target intervals embedded in longer background sequences. Stimuli varied in binocular correlation: correlated (C) patterns contained identical luminance profiles in both eyes, anticorrelated (A) patterns had inverted luminance dots, and uncorrelated (U) patterns had independent dot arrangements. Across three experiments, we measured (1) the presentation duration threshold required to detect a change in correlation, (2) simple reaction times (RTs) to the same transitions at suprathreshold levels, and (3) psychometric functions across durations for selected transitions. In Experiment 1, A→C transitions yielded significantly higher duration thresholds than C→A, indicating a suppressive influence associated with prior anticorrelation. In contrast, Experiment 2 showed that A→C transitions produced the shortest RTs, while C→U transitions were slowest, suggesting a rebound-like facilitation following prior suppression. Experiment 3 confirmed these temporal and contrast dependences, with opposite changes in contrast threshold and reaction times between transitions toward and away from the correlated fusional states. This divergence between perceptual onset and reaction time is consistent with a two-phase account in which binocular anticorrelation is associated with an initial suppressive phase followed by rebound-like facilitation that accelerates responses once the target becomes detectable.
These findings are consistent with current models of binocular rivalry and fusion, and provide a temporally resolved behavioral perspective on how inhibitory control in sensory systems may dynamically influence subsequent responsiveness under conditions of perceptual ambiguity.
Raviv, H.; Hasenfratz, L.; Gousios, K.; Faryna, M.; Beaty, R.; Johnson, D.; Chen, B.; Altenhof, A.; Ryan, B.; Greenberg, C. A.; Hong, Z.; Assayag, G.; Tsyhanov, A.; Malakhov, V.; Rosenwein, T.; Raviv, O.; Lew-Williams, C.; Hasson, U.
Human development unfolds in continuous, multimodal environments across seconds, days, and years, yet most developmental datasets capture sparse, context-limited samples of everyday life. We introduce the First 1,000 Days (1kD) Project, an initiative designed to collect ultra-dense, longitudinal, child-centered data that capture developmental trajectories within their full ecological context. Fifteen U.S. homes with 17 infants were recorded 12-14 hours per day over a median of 944 days, yielding ~1.18 million hours of raw audiovisual data. We present an end-to-end framework for large-scale longitudinal naturalistic measurement and a scalable analysis pipeline of the collected data. In a case study, we describe how we utilized our pipeline to isolate child-centered speech, resulting in the collection of 2,000 to 6,000 hours of transcribed speech for each infant. We demonstrate that dense sampling within the home environment reveals a stable, household-specific lexical structure, which sparse sampling methods consistently fail to capture. The 1kD project offers a blueprint for teams aiming to collect and analyze natural behavior at scale in real-world settings.
Debnath, A.; Sarkar, S.
Background: Alzheimer's disease (AD) causes progressive decline in language and cognition. Automated speech analysis has emerged as a promising screening tool, yet clinical data scarcity limits progress. To address this, we generated a large-scale simulated speech dataset to model linguistic and acoustic deterioration across cognitive stages: Control, Mild Cognitive Impairment (MCI), and AD. Methods: Using Monte Carlo simulations, we emulated the Pitt DementiaBank "Cookie Theft" narratives. Acoustic features (speech rate, pause duration, jitter, shimmer) and linguistic features (type-token ratio, unique-word count, filler usage) were synthetically sampled from real-world DementiaBank distributions. We trained an XGBoost classifier to distinguish diagnostic groups, and applied SHAP (SHapley Additive exPlanations) to assess feature importance. Results: The model achieved high discriminative performance (AUC ≈ 0.94; accuracy ≈ 85%). Compared to controls, simulated MCI and AD groups showed progressive declines in fluency and lexical diversity, and increases in disfluencies and voice instability. SHAP analysis revealed that key predictors included reduced type-token ratio, higher pause and filler rates, and elevated jitter/shimmer. Classification was most accurate for Control vs. AD; MCI misclassifications highlighted intermediate profiles. Interpretation: Our framework, FMN (Forget Me Not), captures clinically relevant speech changes using simulated data, offering an explainable and scalable approach for cognitive screening. While not a substitute for real datasets, FMN validates a pipeline that mirrors known AD markers and can guide future real-world deployments. External validation remains a key next step for translational impact.
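The Monte Carlo sampling step can be sketched as below; the Gaussian form and all group-level parameters are invented for illustration and are not the actual DementiaBank statistics.

```python
import random

# Illustrative group-level (mean, sd) pairs, NOT the real DementiaBank values:
# speech rate (words/s) and type-token ratio fall from Control to AD, pauses rise.
GROUPS = {
    "Control": {"speech_rate": (2.5, 0.3), "ttr": (0.55, 0.05), "pause_s": (0.6, 0.2)},
    "MCI":     {"speech_rate": (2.1, 0.3), "ttr": (0.48, 0.05), "pause_s": (0.9, 0.2)},
    "AD":      {"speech_rate": (1.6, 0.3), "ttr": (0.40, 0.05), "pause_s": (1.4, 0.3)},
}

def simulate(group, n, rng):
    """Monte Carlo sampling of feature vectors from group-specific
    distributions (Gaussian is an assumption for this sketch)."""
    params = GROUPS[group]
    return [
        {feat: rng.gauss(mu, sd) for feat, (mu, sd) in params.items()}
        for _ in range(n)
    ]

rng = random.Random(42)
ad = simulate("AD", 500, rng)
controls = simulate("Control", 500, rng)
mean_ttr_ad = sum(s["ttr"] for s in ad) / len(ad)
mean_ttr_ctl = sum(s["ttr"] for s in controls) / len(controls)
print(mean_ttr_ad < mean_ttr_ctl)  # simulated AD shows lower lexical diversity
```

A classifier such as XGBoost would then be trained on these sampled vectors with the group label as the target.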
Figarola, V.; Liang, W.; Luthra, S.; Parker, E.; Winn, M.; Brown, C.; Shinn-Cunningham, B. G.
Listeners face many challenges when trying to maintain attention to a target source in everyday settings; for instance, reverberation distorts acoustic cues and interruptions capture attention. However, little is known about how these challenges affect the ability to maintain selective attention. Here, we measured syllable recall accuracy and pupil dilation during a spatial selective attention task that was sometimes disrupted. Participants heard two competing, temporally interleaved syllable streams presented in pseudo-anechoic or reverberant environments. On randomly selected trials, a sudden interruption occurred mid-sequence. Compared to anechoic trials, reverberant performance was worse overall, and the interrupter disrupted performance. In uninterrupted trials, reverberation reduced peak pupil dilation both when it was consistent across all stimuli in a block and when it was randomized trial to trial, suggesting temporal smearing reduced clarity of the scene and the salience of events in the ongoing streams. Pupil dilations in response to interruptions indicated perceptual salience was strong across reverberant and anechoic conditions. Specifically, baseline pupil size before trials did not vary across room conditions, and mixing or blocking of trials (altering stimulus expectations) had no impact on pupillary responses. Together, these findings highlight that stimulus salience drives cognitive load more strongly than does task performance.
Altinordu, N.; Boynton, G. M.; Fine, I.
Color is a prominent feature of visual experience, yet humans can recognize objects easily and accurately from grayscale images. We examined whether color becomes more useful when spatial information is degraded due to blurring. Participants viewed naturalistic scenes in color or grayscale, and reported whether a named target object was present across a range of blur levels that simulated optical defocus from 0-8 diopters. With unblurred images, performance did not differ between color and grayscale conditions, but as blur increased, recognition accuracy declined. Color provided a modest but reliable advantage at higher levels of blur, suggesting that color becomes increasingly useful when optical quality is degraded. We hypothesize that the evolutionary shift towards trichromacy may have been partially driven by the need to compensate for optical degradation due to aging and/or accumulated light exposure.
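Optical defocus of this kind is commonly approximated in software by Gaussian blur. A minimal NumPy sketch, assuming a purely illustrative mapping from diopters to blur width (the true mapping depends on pupil size and viewing geometry, which the abstract does not specify):

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at ~3 standard deviations."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def defocus(image, diopters, px_per_diopter=2.0):
    """Separable Gaussian blur standing in for optical defocus.

    `px_per_diopter` is a hypothetical scale factor, not a calibrated value.
    """
    if diopters == 0:
        return image.copy()
    k = gaussian_kernel(px_per_diopter * diopters)
    rows = np.apply_along_axis(np.convolve, 1, image, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

rng = np.random.default_rng(0)
scene = rng.normal(size=(64, 64))     # stand-in for a grayscale scene
blurred = defocus(scene, diopters=4)  # blur removes high spatial frequencies
```

In a real stimulus pipeline the blur would be applied per color channel, leaving chromatic contrast partially intact while luminance detail is lost.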
Marchesano, M.; Silva, A. C.; Tassino, B.
Both active movement profiles and robust circadian rhythms are linked to improved health outcomes, yet the underlying mechanisms remain partially understood. We investigated this relationship in young adults (n = 169, aged 18-30 years) under real-world conditions using actigraphy data. We performed k-means clustering on 12 accelerometer-based features capturing magnitude, duration, frequency, and intensity distribution to derive movement behavior profiles. As a proxy for circadian rhythm integrity we computed the Circadian Function Index (CFI), which combines intradaily variability, interdaily stability, and relative amplitude. We also assessed circadian phase and sleep quality parameters. Additionally, we quantified light exposure and physical activity over 3-hour daily intervals. The unsupervised algorithm identified two non-overlapping profiles among participants, the More Active (MA) and the Less Active (LA) profiles. MA exhibited a higher CFI (0.81 ± 0.06 vs. 0.69 ± 0.06, p < 0.001), which was also positively associated with early-evening physical activity, but not with light exposure. MA also showed an earlier circadian phase, estimated as the midpoint of the five least active hours (L5c, 04:30 ± 01:03 vs. 04:59 ± 01:15, p adj. = 0.04), which was inversely associated with early-morning physical activity and late-morning light exposure. We found no differences in sleep quality between MA and LA. Our results underscore the association between movement behavior and overall circadian rhythm integrity. Importantly, these findings reinforce actigraphy as a multidimensional tool for both health research and clinical applications.
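The three nonparametric metrics behind the CFI, interdaily stability (IS), intradaily variability (IV), and relative amplitude (RA), can all be computed directly from the activity-count series. A NumPy sketch follows, using one common normalization of IV into the 0-1 range before averaging; the exact normalization in the published CFI may differ.

```python
import numpy as np

def circadian_function_index(counts, samples_per_hour=60):
    """IS, IV, RA and their average (a CFI-style summary) from epoch-level activity.

    `counts` must span a whole number of days.
    """
    x = np.asarray(counts, float)
    n = x.size
    var = x.var()

    hourly = x.reshape(-1, samples_per_hour).mean(axis=1)  # hourly means
    profile = hourly.reshape(-1, 24).mean(axis=0)          # average 24-h profile

    # Interdaily stability: variance of the 24-h profile over total variance.
    is_ = ((profile - x.mean()) ** 2).mean() / var
    # Intradaily variability: mean squared successive difference over variance.
    iv = (np.diff(x) ** 2).sum() / ((n - 1) * var)

    def windowed_means(p, w):                              # circular rolling mean
        ext = np.concatenate([p, p[:w - 1]])
        return np.convolve(ext, np.ones(w) / w, mode="valid")

    m10 = windowed_means(profile, 10).max()                # 10 most active hours
    l5 = windowed_means(profile, 5).min()                  # 5 least active hours
    ra = (m10 - l5) / (m10 + l5)

    cfi = (is_ + ra + (2 - iv) / 2) / 3  # assumed normalization of IV to 0-1
    return is_, iv, ra, cfi

# Three days of smooth sinusoidal "activity" should look highly rhythmic.
t = np.arange(3 * 24 * 60) / 60.0                          # time in hours
activity = 50 * (1 + np.sin(2 * np.pi * t / 24))
is_, iv, ra, cfi = circadian_function_index(activity)
```

Real actigraphy would replace the synthetic sinusoid, and L5c (the circadian-phase marker in the abstract) falls out of the same L5 window as its midpoint.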
Cook, D. A.; Laack, T. A.; Pankratz, V. S.
Purpose: Evaluate large language models (LLMs) for scoring medical student essays, and compare various prompting techniques and models. Methods: OpenAI GPT scored 51 medical student reflection essays (15 real, 36 fabricated) using a previously reported 6-point rubric (April-May 2025). We compared 29 prompt-model conditions by systematically varying the LLM prompts (including the persona, scoring rubric, few-shot learning [exemplars], chain-of-thought reasoning, and temperature), fine-tuning, and model (including GPT-4.1, GPT-4.1-mini, GPT-o4-mini, and GPT-4-Turbo). Outcomes were accuracy (compared with human raters, measured using single-score intraclass correlation coefficient [ICC] and mean absolute difference [MAD; zero indicates perfect agreement]), within-condition reproducibility, and cost. Results: Across all conditions, scoring one essay took a mean (SD) of 3.73 (3.12) seconds. The cost to score 100 essays was USD $0.04 for GPT-4.1-mini, $0.21 for GPT-4.1, $0.57 for GPT-4.1 with 3 exemplars, and $2.00 for fine-tuned GPT-4.1. When the one-time cost of fine-tuning was amortized across 10,000 essays, the cost for fine-tuned GPT-4.1 was $0.20 per 100. Accuracy was "almost perfect" (ICC > 0.80) for 28/29 conditions (97%). Fine-tuned models were more accurate than non-fine-tuned models (MAD difference -0.24 [95% CI, -0.34, -0.14]). Conditions with exemplars were more accurate than those without (MAD difference -0.44 [CI, -0.57, -0.31]). Accuracy progressively decreased as 6, 3, 1, and 0 rubric levels were explicitly defined in the prompt (P < .001). Contrary to hypotheses, accuracies for chain-of-thought prompts and variations in temperature and persona were not significantly different from the baseline prompt. Discussion: Automated LLM essay scoring demonstrated near-perfect accuracy and reproducibility for most prompt-model conditions. Reproducibility ICC was > 0.80 for 28/29 conditions (97%).
Fine-tuned models and prompts with exemplars had higher accuracy but higher cost. Fine-tuned models had lower per-essay costs for larger essay volumes. For smaller volumes, non-fine-tuned GPT-4.1 provided excellent results at moderate cost. GPT-4.1-mini provided very good results at low cost.
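The two agreement metrics are straightforward to reproduce. MAD is the mean absolute difference between LLM and human scores; the single-score ICC is presumably a two-way model such as ICC(2,1), sketched below from its ANOVA mean squares (the abstract does not name the exact ICC variant, so treat this as one plausible reading):

```python
import numpy as np

def icc_2_1(Y):
    """Single-score ICC(2,1) from an (essays x raters) score matrix."""
    Y = np.asarray(Y, float)
    n, k = Y.shape
    grand = Y.mean()
    row_means = Y.mean(axis=1)
    col_means = Y.mean(axis=0)
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)   # between essays
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)   # between raters
    resid = Y - row_means[:, None] - col_means[None, :] + grand
    mse = (resid ** 2).sum() / ((n - 1) * (k - 1))         # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def mad(a, b):
    """Mean absolute difference; zero indicates perfect agreement."""
    return np.abs(np.asarray(a, float) - np.asarray(b, float)).mean()

# Toy scores: perfect agreement vs. a constant 1-point rater bias.
human = np.array([1, 2, 3, 4, 5])
llm_same = human.copy()
llm_shift = human + 1
```

Note that a constant bias leaves MAD at 1.0 while ICC(2,1) drops below 1, because the two-way model penalizes systematic rater differences.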
Scanzi, D.; Taylor, D. A.; McNair, K. A.; King, R. O. C.; Braddock, C.; Corballis, P. M.
Electroencephalography (EEG) data are inherently contaminated by non-neuronal noise, including eye movements, muscle activity, cardiac signals, electrical interference, and technical issues such as poorly connected electrodes. Preprocessing to remove these artefacts is essential, yet the optimal method remains unclear due to the vast number of available techniques, their combinatorial use in pipelines, and adjustable parameters. Consequently, most studies adopt ad hoc preprocessing strategies based on dataset characteristics, study goals, and researcher expertise, with little justification for their choices. Such variability can influence downstream results, potentially determining whether effects are detected, and introduces risks of questionable analytical practices. Here, we present a method to objectively evaluate and compare preprocessing pipelines. Our approach uses realistically simulated signals injected into real EEG data as "ground truth", enabling the assessment of a pipeline's ability to remove noise without distorting neuronal signals. This evaluation is independent of the study's main analyses, ensuring that pipeline selection does not bias results. By applying this procedure, researchers can select preprocessing strategies that maximize signal-to-noise ratio while maintaining the integrity of the neural signal, improving both reproducibility and interpretability of EEG studies. Although the data presented here focus on processing and analysis most relevant for ERP research, the method can be flexibly expanded to other types of analyses or signals.
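The core idea, injecting a known simulated signal into a recording and checking how well a pipeline preserves it, can be sketched with a toy example. Everything below is illustrative: the "real EEG" is white noise, the artefact is a slow sinusoidal drift, and a polynomial detrend stands in for an actual preprocessing pipeline; a real application would inject into genuine EEG epochs and compare the candidate pipelines under study.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 250                                  # sampling rate, Hz
t = np.arange(0, 2, 1 / fs)               # one 2-s epoch

# Known "ground truth": an ERP-like Gaussian bump injected at 400 ms.
truth = np.exp(-0.5 * ((t - 0.4) / 0.05) ** 2)

# Stand-ins for ongoing EEG background and a slow drift artefact.
background = rng.normal(0, 0.2, t.size)
drift = np.sin(2 * np.pi * 0.2 * t)
contaminated = truth + background + drift

def pipeline_detrend(x, t, degree=3):
    """Toy pipeline: remove a low-order polynomial trend (absorbs the drift)."""
    fit = np.polynomial.Polynomial.fit(t, x, degree)
    return x - fit(t)

def recovery(cleaned, truth):
    """Correlation between the cleaned epoch and the injected ground truth."""
    return np.corrcoef(cleaned, truth)[0, 1]

raw_score = recovery(contaminated, truth)
clean_score = recovery(pipeline_detrend(contaminated, t), truth)
```

Because the injected signal is known exactly, `recovery` scores pipelines on a criterion that is independent of the study's own hypotheses, which is the bias-avoidance property the method relies on.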